Take-home Exercise 3: Modelling Geography of Financial Inclusion with Geographically Weighted Methods

Author

Leng Yi Xiu

Published

November 2, 2024

Modified

November 12, 2024

1 Overview

Tanzania is one of the Africa’s fastest growing economies with nearly 7 percent annual national GDP growth since 2000. Despite the robust economic growth, widespread poverty persists. Approximately 49% of Tanzania’s population lives below the international extreme poverty line. This could be due to the lack of financial inclusion - the availability and equality of opportunities to access financial services such as banking, loan, equity and insurance products.

1.1 Objective

The objective of this exercise is to build an exploratory model that determines factors affecting financial inclusion by using non-spatial regression and geographically weighted regression methods.

Examples of independent factors include demographics (rural vs urban), marital status, gender, financial education, access to technology etc.

Examples of dependent factors include savings, financial situation, free access to banks, insurance coverage etc.


2 Getting Started

The code chunk below installs and loads olsrr, corrplot, ggpubr, sf, spdep, GWmodel, tmap, tidyverse, gtsummary packages into R environment:

pacman::p_load(olsrr, corrplot, ggpubr, sf, spdep, GWmodel, tmap, tidyverse, gtsummary)

3 Data Preparation

For the purpose of this exercise, we will be using two data sets:

  1. District level boundary GIS data for Tanzania

  2. FinScope Tanzania 2023 Survey results

3.1 Importing data into R Environment

GB = st_read(dsn = "data/rawdata/geospatial", layer = "geoBoundaries-TZA-ADM2")
Reading layer `geoBoundaries-TZA-ADM2' from data source 
  `D:\2. SMU - MITB\Term 4\ISSS626 Geospatial\YX-Leng\ISSS626-Geospatial\Take-home_Ex\Take-home_Ex03\data\rawdata\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 170 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 29.58953 ymin: -11.76235 xmax: 40.44473 ymax: -0.983143
Geodetic CRS:  WGS 84

πŸ’‘ Updating CRS information

st_crs(GB) <- 4210
Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
that
GB <- st_transform(GB, crs = 32736)
head(GB, 5)
Simple feature collection with 5 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 694340.1 ymin: 9563538 xmax: 992833.3 ymax: 9762474
Projected CRS: WGS 84 / UTM zone 36S
     shapeName shapeISO                 shapeID shapeGroup shapeType
1       Arusha     <NA> 72390352B32479700182608        TZA      ADM2
2 Arusha Urban     <NA> 72390352B90906351205470        TZA      ADM2
3       Karatu     <NA> 72390352B22674606658861        TZA      ADM2
4      Longido     <NA> 72390352B95731720096997        TZA      ADM2
5         Meru     <NA> 72390352B99598192663387        TZA      ADM2
                        geometry
1 MULTIPOLYGON (((929196 9602...
2 MULTIPOLYGON (((918731.5 96...
3 MULTIPOLYGON (((814868.8 96...
4 MULTIPOLYGON (((873372.8 96...
5 MULTIPOLYGON (((929196 9602...
Tanz_raw = read_csv("data/rawdata/aspatial/FinScope Tanzania 2023_Individual Main Data_FINAL.csv")
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
Rows: 9915 Columns: 721
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (703): reg_name, dist_name, ward_code1, ward_name, ea_code, clustertype,...
dbl  (13): SN, reg_code, dist_code, c8c, D6_1_1, D6_1_2, D6_1_3, gov_3, cmg4...
lgl   (5): e_5_1, e_5_2, g_5_2__5, g_5_2__13, serv2_4

β„Ή Use `spec()` to retrieve the full column specification for this data.
β„Ή Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(Tanz_raw, 5)
# A tibble: 5 Γ— 721
     SN reg_name      reg_code dist_code dist_name ward_code1 ward_name ea_code
  <dbl> <chr>            <dbl>     <dbl> <chr>     <chr>      <chr>     <chr>  
1  4529 Mwanza              19         7 Misungwi  251        Mondo     004    
2  4245 Kagera              18         7 Missenyi  011        Kakunyu   001    
3  8149 Mbeya               12         3 Kyela     283        Nkuyu     301    
4  6763 Dodoma               1         3 Kongwa    123        Kibaigwa  301    
5  7805 Dar es Salaam        7         2 Ilala     252        Majohe    029    
# β„Ή 713 more variables: clustertype <chr>, c1 <chr>, c2 <chr>, c7 <chr>,
#   c8c <dbl>, c9 <chr>, c10 <chr>, c11 <chr>, c8n_a1 <chr>, c8n_a2 <chr>,
#   c8n_b1 <chr>, c8n_b2 <chr>, c8n_c1 <chr>, c8n_c2 <chr>, c8n_d1 <chr>,
#   c8n_d2 <chr>, c8n_e1 <chr>, c8n_e2 <chr>, c8n_f1 <chr>, c8n_f2 <chr>,
#   c12_1 <chr>, c12_2 <chr>, c12_3 <chr>, c12_4 <chr>, c22 <chr>, c14 <chr>,
#   c15 <chr>, C16 <chr>, c16_unit <chr>, c17_1__1 <chr>, c17_1__2 <chr>,
#   c17_1__3 <chr>, c17_1__4 <chr>, c17_1__5 <chr>, c17_1__6 <chr>, …

πŸ’‘ Count number of unique dist_name

Tanz_raw %>%
  count(dist_name) %>%
  print()
# A tibble: 148 Γ— 2
   dist_name      n
   <chr>      <int>
 1 Arumeru      105
 2 Arusha        75
 3 Babati       105
 4 Bagamoyo      73
 5 Bahi          45
 6 Bariadi       75
 7 Biharamulo    45
 8 Buhigwe       45
 9 Bukoba        74
10 Bukombe       43
# β„Ή 138 more rows

3.2 Geospatial: Converting Multipolygon to Polygon

As the district level boundary GIS data in multipolygon form will cause overlays or intersections, it is more intuitive to calculate the area of each polygon and group them together by the unique district name and then select the largest polygon by area.

πŸ’‘ Calculate the area of each polygon

sf_GB <- GB %>%
  st_cast("POLYGON") %>%
  mutate(area = st_area(.))
Warning in st_cast.sf(., "POLYGON"): repeating attributes for all
sub-geometries for which they may not be constant

πŸ’‘ Group and select the largest polygon by area

GB_clean <- sf_GB %>%
  group_by(shapeName) %>%
  filter(area == max(area)) %>%
  ungroup() %>%
  select(-area) %>%
  select(shapeName)

3.3 Aspatial: Filtering columns for information required

As the FinScope Tanzania 2023 dataset contains 721 variables and not all variables will be used in our analysis, we will filter for the necessary independent and dependent variables based to analyse on:

(a) how cluster type, gender and access to technology affects emergency savings, and

(b) how cluster type, gender and access to technology affects borrowing.

Tanz_filter <- Tanz_raw %>%
  select(dist_name, clustertype, c8c, c9, c23__1,c23__2,,e_7_n_3,F4_1__3,F4_1__5,F4_1__8,F4_1__13,F4_1__14,
         g_2_3,g_5_1__1,g_5_1__2,g_5_1__3,g_5_1__5,g_5_1__6,g_5_1__7,
         g_5_1__8,g_5_1__9,g_5_1__10,g_5_1__11,g_5_1__12,g_5_1__13,
         ins3_1__3,ins3_1__9,Household_weight,population_wt)

3.4 Aspatial: Rename Columns

Tanz_filter <- Tanz_filter %>%
  rename(
    Age = c8c,
    Gender = c9, 
    Access_MobilePhone = c23__1,
    Access_Internet = c23__2,
    Savings_Emerg = e_7_n_3,
    Savings_SACCOS = F4_1__3,
    Savings_MobileWallet = F4_1__5,
    Savings_Group = F4_1__8,
    Savings_Cash = F4_1__13,
    Savings_Lifestock = F4_1__14,
    Borrow_Amount = g_2_3,
    Borrow_Bank = g_5_1__1,
    Borrow_MFI = g_5_1__2,
    Borrow_SACCOS = g_5_1__3,
    Borrow_Gov = g_5_1__5,
    Borrow_MM = g_5_1__6,
    Borrow_Pension = g_5_1__7,
    Borrow_Employer = g_5_1__8,
    Borrow_Fam_Pay = g_5_1__9,
    Borrow_Fam_NoPay = g_5_1__10,
    Borrow_SavGroup = g_5_1__11,
    Borrow_ML = g_5_1__12,
    Borrow_Religious = g_5_1__13,
    Insurance_Health = ins3_1__3,
    Insurance_Life = ins3_1__9
    )

3.5 Aspatial: Convert Yes/No to a numeric vector

Tanz_filter$clustertype <- as.numeric(Tanz_filter$clustertype == "Urban")
Tanz_filter$Gender <- as.numeric(Tanz_filter$Gender == "Female")
Tanz_filter$Access_MobilePhone <- as.numeric(Tanz_filter$Access_MobilePhone == "Yes")
Tanz_filter$Access_Internet <- as.numeric(Tanz_filter$Access_Internet == "Yes")
Tanz_filter$Savings_Emerg <- as.numeric(Tanz_filter$Savings_Emerg == "Yes")
Tanz_filter$Savings_SACCOS <- as.numeric(Tanz_filter$Savings_SACCOS == "Yes")
Tanz_filter$Savings_MobileWallet <- as.numeric(Tanz_filter$Savings_MobileWallet == "Yes")
Tanz_filter$Savings_Group <- as.numeric(Tanz_filter$Savings_Group == "Yes")
Tanz_filter$Savings_Cash <- as.numeric(Tanz_filter$Savings_Cash == "Yes")
Tanz_filter$Savings_Lifestock <- as.numeric(Tanz_filter$Savings_Lifestock == "Yes")
Tanz_filter$Borrow_Bank <- as.numeric(Tanz_filter$Borrow_Bank == "Yes")
Tanz_filter$Borrow_MFI <- as.numeric(Tanz_filter$Borrow_MFI == "Yes")
Tanz_filter$Borrow_SACCOS <- as.numeric(Tanz_filter$Borrow_SACCOS == "Yes")
Tanz_filter$Borrow_Gov <- as.numeric(Tanz_filter$Borrow_Gov == "Yes")
Tanz_filter$Borrow_MM <- as.numeric(Tanz_filter$Borrow_MM == "Yes")
Tanz_filter$Borrow_Pension <- as.numeric(Tanz_filter$Borrow_Pension == "Yes")
Tanz_filter$Borrow_Employer <- as.numeric(Tanz_filter$Borrow_Employer == "Yes")
Tanz_filter$Borrow_Fam_Pay <- as.numeric(Tanz_filter$Borrow_Fam_Pay == "Yes")
Tanz_filter$Borrow_Fam_NoPay <- as.numeric(Tanz_filter$Borrow_Fam_NoPay == "Yes")
Tanz_filter$Borrow_SavGroup <- as.numeric(Tanz_filter$Borrow_SavGroup == "Yes")
Tanz_filter$Borrow_ML <- as.numeric(Tanz_filter$Borrow_ML == "Yes")
Tanz_filter$Borrow_Religious <- as.numeric(Tanz_filter$Borrow_Religious == "Yes")
Tanz_filter$Insurance_Health <- as.numeric(Tanz_filter$Insurance_Health == "Yes")
Tanz_filter$Insurance_Life <- as.numeric(Tanz_filter$Insurance_Life == "Yes")
Tanz_filter$Borrow_Amount <- ifelse(is.na(Tanz_filter$Borrow_Amount), 0, Tanz_filter$Borrow_Amount)
Tanz_filter$Borrow_Amount <- as.numeric(Tanz_filter$Borrow_Amount)
Warning: NAs introduced by coercion
Tanz_filter[is.na(Tanz_filter)] <- 0
head(Tanz_filter,5)
# A tibble: 5 Γ— 29
  dist_name clustertype   Age Gender Access_MobilePhone Access_Internet
  <chr>           <dbl> <dbl>  <dbl>              <dbl>           <dbl>
1 Misungwi            0    47      1                  1               1
2 Missenyi            0    63      1                  0               0
3 Kyela               1    74      0                  1               0
4 Kongwa              1    29      1                  1               0
5 Ilala               1    53      0                  1               0
# β„Ή 23 more variables: Savings_Emerg <dbl>, Savings_SACCOS <dbl>,
#   Savings_MobileWallet <dbl>, Savings_Group <dbl>, Savings_Cash <dbl>,
#   Savings_Lifestock <dbl>, Borrow_Amount <dbl>, Borrow_Bank <dbl>,
#   Borrow_MFI <dbl>, Borrow_SACCOS <dbl>, Borrow_Gov <dbl>, Borrow_MM <dbl>,
#   Borrow_Pension <dbl>, Borrow_Employer <dbl>, Borrow_Fam_Pay <dbl>,
#   Borrow_Fam_NoPay <dbl>, Borrow_SavGroup <dbl>, Borrow_ML <dbl>,
#   Borrow_Religious <dbl>, Insurance_Health <dbl>, Insurance_Life <dbl>, …

3.6 Aspatial: Replace dist_name to avoid empty polygons

Assumptions

For the dist_name that cannot match to the GIS data for Tanzania, it will be replaced by the values of its Mother District (for coast) or the nearest city.

Tanz_join <- Tanz_filter %>%
  mutate(dist_name = case_when(
    dist_name == "Tanganyika" ~ "Tanga Urban",
    dist_name == "Kigamboni" ~ "Temeke",
    dist_name == "Arumeru" & clustertype == "1" ~ "Arusha Urban",
    dist_name == "Arumeru" & clustertype == "0" ~ "Arusha",
    dist_name == "Butiama" ~ "Butiam",
    dist_name == "Dodoma" ~ "Dodoma Urban",
    dist_name == "Tanga" ~ "Tanga Urban",
    dist_name == "Malinyi" ~ "Morogoro",
    dist_name == "Magharibi B"~ "Magharibi",
    dist_name == "Magharibi A"~ "Magharibi",
    dist_name == "Ubungo"~ "Kinondoni",
    dist_name == "Tabora"~ "Tabora Urban",
    dist_name == "Kibiti"~ "Mkuranga",
    TRUE ~ dist_name  # Keep all other values unchanged
  ))

3.7 Perform relational join

Tanz_final <- left_join(Tanz_join,GB_clean,
                  by = c("dist_name" = "shapeName"), relationship = "many-to-many")

πŸ’‘ Check for any empty polygons and class

empty_polygons <- Tanz_final %>%
  filter(st_is_empty(geometry)) %>%
  select(dist_name) %>%
  distinct()
class(Tanz_final)
[1] "tbl_df"     "tbl"        "data.frame"

3.8 Writing, reading data file to rds

write_rds(Tanz_final, "data/rds/Tanz_final.rds")
Tanz_final <- read_rds("data/rds/Tanz_final.rds")

4 Non-Spatial Regression Methods

4.1 Check for multi-collinearity

Tanz_corr <- Tanz_final %>%
  st_drop_geometry()
corrplot::corrplot(cor(Tanz_corr[,2:27]), 
                   diag = FALSE, 
                   order = "AOE",
                   tl.pos = "td", 
                   tl.cex = 0.5, 
                   method = "number", 
                   type = "upper")

Observations

Since all the correlation values are < 0.8, there is no signs of multi-collinearity.

4.2 Build a non-spatial multi-linear regression

4.2.1 How is borrowing amount affected by various independent variables?

## need to log the borrowing amount to normalize extreme values
Tanz_final$Borrow_Amount <- log(Tanz_final$Borrow_Amount + 1)
Borrowing_mlr <- lm(
  Borrow_Amount ~ clustertype + Age + Gender + Access_MobilePhone + Access_Internet + Borrow_Bank + Borrow_MFI + 
  Borrow_SACCOS + Borrow_Gov + Borrow_MM + Borrow_Pension + Borrow_Employer + Borrow_Fam_Pay + Borrow_Fam_NoPay +  
  Borrow_SavGroup + Borrow_ML + Borrow_Religious,
  data=Tanz_final)

summary(Borrowing_mlr)

Call:
lm(formula = Borrow_Amount ~ clustertype + Age + Gender + Access_MobilePhone + 
    Access_Internet + Borrow_Bank + Borrow_MFI + Borrow_SACCOS + 
    Borrow_Gov + Borrow_MM + Borrow_Pension + Borrow_Employer + 
    Borrow_Fam_Pay + Borrow_Fam_NoPay + Borrow_SavGroup + Borrow_ML + 
    Borrow_Religious, data = Tanz_final)

Residuals:
    Min      1Q  Median      3Q     Max 
-41.226  -0.853  -0.566   0.660  17.137 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)         0.374761   0.099271   3.775 0.000161 ***
clustertype         0.095177   0.055219   1.724 0.084806 .  
Age                 0.002801   0.001508   1.858 0.063193 .  
Gender             -0.314786   0.050635  -6.217 5.28e-10 ***
Access_MobilePhone  0.388865   0.072849   5.338 9.61e-08 ***
Access_Internet     0.188531   0.058460   3.225 0.001264 ** 
Borrow_Bank         8.752231   0.226625  38.620  < 2e-16 ***
Borrow_MFI          9.294665   0.192160  48.369  < 2e-16 ***
Borrow_SACCOS       8.621757   0.359141  24.007  < 2e-16 ***
Borrow_Gov          6.495580   0.570177  11.392  < 2e-16 ***
Borrow_MM           2.848696   0.199602  14.272  < 2e-16 ***
Borrow_Pension     -3.805797   1.114491  -3.415 0.000641 ***
Borrow_Employer     3.609994   0.265147  13.615  < 2e-16 ***
Borrow_Fam_Pay      8.767819   0.057526 152.414  < 2e-16 ***
Borrow_Fam_NoPay    0.842179   0.215213   3.913 9.17e-05 ***
Borrow_SavGroup     8.695528   0.099584  87.318  < 2e-16 ***
Borrow_ML           7.036817   0.164615  42.747  < 2e-16 ***
Borrow_Religious    8.537908   1.007762   8.472  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.467 on 9897 degrees of freedom
Multiple R-squared:  0.7967,    Adjusted R-squared:  0.7963 
F-statistic:  2281 on 17 and 9897 DF,  p-value: < 2.2e-16
Observations

Adjusted R-squared of ~80% (0.7963) indicates a strong model fit and no overfitting.

The model shows that the source of borrowing strongly influenced the borrowing amounts. Tanzanians tend to borrow from micro-finance institutions (MFI) and family/religious groups, but less from Pension (negative estimate). The negative estimate for gender also indicates that males borrow less than females (one-hot encoding female = 1).

The significance of access to mobile phones to borrowing amount suggests that financial accessibility may be linked to mobile connectivity.

ols_vif_tol(Borrowing_mlr)
            Variables Tolerance      VIF
1         clustertype 0.9039700 1.106231
2                 Age 0.9743801 1.026293
3              Gender 0.9701335 1.030786
4  Access_MobilePhone 0.9324815 1.072407
5     Access_Internet 0.8764393 1.140980
6         Borrow_Bank 0.9751965 1.025434
7          Borrow_MFI 0.9861093 1.014086
8       Borrow_SACCOS 0.9874746 1.012684
9          Borrow_Gov 0.9868489 1.013326
10          Borrow_MM 0.9760810 1.024505
11     Borrow_Pension 0.9801367 1.020266
12    Borrow_Employer 0.9811711 1.019190
13     Borrow_Fam_Pay 0.9617857 1.039733
14   Borrow_Fam_NoPay 0.9584516 1.043350
15    Borrow_SavGroup 0.9833790 1.016902
16          Borrow_ML 0.9826744 1.017631
17   Borrow_Religious 0.9990478 1.000953
ols_plot_resid_fit(Borrowing_mlr)

ols_plot_resid_hist(Borrowing_mlr)

mlr.output <- as.data.frame(Borrowing_mlr$residuals) 
Borrowing.res.sf <- cbind(Tanz_final, 
                          MLR_RES = Borrowing_mlr$residuals)
Borrowing.sf <- st_as_sf(Borrowing.res.sf)
Borrowing.sp <- as_Spatial(Borrowing.sf)
Borrowing.sp
class       : SpatialPolygonsDataFrame 
features    : 9915 
extent      : 122650.6, 1317290, 8697326, 9890957  (xmin, xmax, ymin, ymax)
crs         : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs 
variables   : 30
names       : dist_name, clustertype, Age, Gender, Access_MobilePhone, Access_Internet, Savings_Emerg, Savings_SACCOS, Savings_MobileWallet, Savings_Group, Savings_Cash, Savings_Lifestock,    Borrow_Amount, Borrow_Bank, Borrow_MFI, ... 
min values  :    Arusha,           0,  16,      0,                  0,               0,             0,              0,                    0,             0,            0,                 0,                0,           0,          0, ... 
max values  :      Wete,           1, 100,      1,                  1,               1,             1,              1,                    1,             1,            1,                 1, 18.4206807539524,           1,          1, ... 
tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(GB_clean)+
  tmap_options(check.and.fix = TRUE) +
  tm_polygons(alpha = 0.4) +
tm_shape(Borrowing.sp) +
  tm_dots(col = "MLR_RES",
          alpha = 0.6,
          style = "quantile")
Variable(s) "MLR_RES" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
tmap_mode("plot")
tmap mode set to plotting

4.2.2 How is emergency savings affected by various independent variables?

Savings_mlr <- lm(
  Savings_Emerg ~ clustertype + Age + Gender + Access_MobilePhone + Access_Internet + Savings_SACCOS + 
  Savings_MobileWallet + Savings_Group + Savings_Cash + Savings_Lifestock + Insurance_Health + Insurance_Life,
  data=Tanz_final)

summary(Savings_mlr)

Call:
lm(formula = Savings_Emerg ~ clustertype + Age + Gender + Access_MobilePhone + 
    Access_Internet + Savings_SACCOS + Savings_MobileWallet + 
    Savings_Group + Savings_Cash + Savings_Lifestock + Insurance_Health + 
    Insurance_Life, data = Tanz_final)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.27112 -0.45711  0.05265  0.49325  0.78906 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)           0.3405302  0.0188013  18.112  < 2e-16 ***
clustertype           0.0541882  0.0105825   5.121 3.10e-07 ***
Age                  -0.0008932  0.0002887  -3.094 0.001982 ** 
Gender               -0.0438508  0.0096768  -4.532 5.92e-06 ***
Access_MobilePhone    0.1719591  0.0139508  12.326  < 2e-16 ***
Access_Internet       0.1491384  0.0111630  13.360  < 2e-16 ***
Savings_SACCOS        0.2601189  0.0695143   3.742 0.000184 ***
Savings_MobileWallet  0.3179673  0.0300478  10.582  < 2e-16 ***
Savings_Group         0.3361823  0.0205502  16.359  < 2e-16 ***
Savings_Cash          0.3173420  0.0840223   3.777 0.000160 ***
Savings_Lifestock     0.2102591  0.0464044   4.531 5.94e-06 ***
Insurance_Health      0.0601241  0.0840342   0.715 0.474335    
Insurance_Life        0.2288977  0.2121822   1.079 0.280713    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4741 on 9902 degrees of freedom
Multiple R-squared:  0.1005,    Adjusted R-squared:  0.09937 
F-statistic: 92.16 on 12 and 9902 DF,  p-value: < 2.2e-16
Observations

Very low p-value (< 2.2e-16) indicating that the model is statistically significant.

  • Gender: The negative estimate (-0.04, p < 0.001) suggests that males tend to save less for emergencies.

  • Age: There’s a slight but significant negative effect of age on emergency savings, possibly indicating a reduced focus on emergency savings as age increases.

  • Insurance (Health and Life): high p-values indicative that there is no significant relationship with emergency savings.

The low R-squared value suggests that emergency savings might be influenced by factors not captured in this model, such as level of inome, financial literacy, or household characteristics. Further exploration with additional variables could provide more insights into determinants of emergency savings.

ols_vif_tol(Savings_mlr)
              Variables Tolerance      VIF
1           clustertype 0.9093048 1.099741
2                   Age 0.9816423 1.018701
3                Gender 0.9813460 1.019009
4    Access_MobilePhone 0.9393853 1.064526
5       Access_Internet 0.8880461 1.126068
6        Savings_SACCOS 0.9944206 1.005611
7  Savings_MobileWallet 0.9907365 1.009350
8         Savings_Group 0.9891101 1.011010
9          Savings_Cash 0.9981986 1.001805
10    Savings_Lifestock 0.9953986 1.004623
11     Insurance_Health 0.9979175 1.002087
12       Insurance_Life 0.9990424 1.000959
ols_plot_resid_fit(Savings_mlr)

ols_plot_resid_hist(Savings_mlr)

mlr.output <- as.data.frame(Savings_mlr$residuals) 
Savings.res.sf <- cbind(Tanz_final, 
                          MLR_RES = Savings_mlr$residuals)
Savings.sf <- st_as_sf(Savings.res.sf)
Savings.sp <- as_Spatial(Savings.sf)
Savings.sp
class       : SpatialPolygonsDataFrame 
features    : 9915 
extent      : 122650.6, 1317290, 8697326, 9890957  (xmin, xmax, ymin, ymax)
crs         : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs 
variables   : 30
names       : dist_name, clustertype, Age, Gender, Access_MobilePhone, Access_Internet, Savings_Emerg, Savings_SACCOS, Savings_MobileWallet, Savings_Group, Savings_Cash, Savings_Lifestock,    Borrow_Amount, Borrow_Bank, Borrow_MFI, ... 
min values  :    Arusha,           0,  16,      0,                  0,               0,             0,              0,                    0,             0,            0,                 0,                0,           0,          0, ... 
max values  :      Wete,           1, 100,      1,                  1,               1,             1,              1,                    1,             1,            1,                 1, 18.4206807539524,           1,          1, ... 
tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(GB_clean)+
  tmap_options(check.and.fix = TRUE) +
  tm_polygons(alpha = 0.4) +
tm_shape(Savings.sp) +
  tm_dots(col = "MLR_RES",
          alpha = 0.6,
          style = "quantile")
Variable(s) "MLR_RES" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
tmap_mode("plot")
tmap mode set to plotting

5 Geographically Weighted Regression Methods

Geographically weighted regression (GWR) is a spatial statistical technique that takes non-stationary variables into consideration and models the local relationships between these independent variables and an outcome of interest / dependent variable.

5.1 Building Adaptive Bandwidth GWR Model for Borrowing

bw.adaptive <- bw.gwr(formula = Borrow_Amount ~ clustertype + Age + Gender + Access_MobilePhone + Access_Internet + Borrow_Bank + Borrow_MFI + 
                     Borrow_SACCOS + Borrow_Gov + Borrow_MM + Borrow_Pension + Borrow_Employer + Borrow_Fam_Pay + Borrow_Fam_NoPay +  
                     Borrow_SavGroup + Borrow_ML + Borrow_Religious, 
                     data=Borrowing.sp, 
                     approach="CV", 
                     kernel="gaussian", 
                     adaptive=TRUE, 
                     longlat=FALSE)
Take a cup of tea and have a break, it will take a few minutes.
          -----A kind suggestion from GWmodel development group
Adaptive bandwidth: 6135 CV score: 61890.33 
Adaptive bandwidth: 3800 CV score: 61106.91 
Adaptive bandwidth: 2355 CV score: 60490.58 
Adaptive bandwidth: 1464 CV score: 59889.21 
Adaptive bandwidth: 911 CV score: 59525.29 
Adaptive bandwidth: 571 CV score: 59731.2 
Adaptive bandwidth: 1122 CV score: 59589.12 
Adaptive bandwidth: 781 CV score: 59524.28 
Adaptive bandwidth: 700 CV score: 59731.76 
Adaptive bandwidth: 830 CV score: 59560.45 
Adaptive bandwidth: 749 CV score: 59536.42 
Adaptive bandwidth: 799 CV score: 59513.08 
Adaptive bandwidth: 812 CV score: 59556.3 
Adaptive bandwidth: 793 CV score: 59521.57 
Adaptive bandwidth: 805 CV score: 59532.43 
Adaptive bandwidth: 797 CV score: 59513.08 

πŸ’‘Calibrating the GWR model using adaptive bandwidth and gaussian kernel

gwr.adaptive <- gwr.basic(formula = Borrow_Amount ~ clustertype + Age + Gender + Access_MobilePhone + Access_Internet + Borrow_Bank + Borrow_MFI + 
                      Borrow_SACCOS + Borrow_Gov + Borrow_MM + Borrow_Pension + Borrow_Employer + Borrow_Fam_Pay + Borrow_Fam_NoPay +  
                      Borrow_SavGroup + Borrow_ML + Borrow_Religious, 
                      data=Borrowing.sp,
                      bw=bw.adaptive, 
                      kernel = 'gaussian', 
                      adaptive=TRUE, 
                      longlat = FALSE)
gwr.adaptive
   ***********************************************************************
   *                       Package   GWmodel                             *
   ***********************************************************************
   Program starts at: 2024-11-12 16:39:37.82901 
   Call:
   gwr.basic(formula = Borrow_Amount ~ clustertype + Age + Gender + 
    Access_MobilePhone + Access_Internet + Borrow_Bank + Borrow_MFI + 
    Borrow_SACCOS + Borrow_Gov + Borrow_MM + Borrow_Pension + 
    Borrow_Employer + Borrow_Fam_Pay + Borrow_Fam_NoPay + Borrow_SavGroup + 
    Borrow_ML + Borrow_Religious, data = Borrowing.sp, bw = bw.adaptive, 
    kernel = "gaussian", adaptive = TRUE, longlat = FALSE)

   Dependent (y) variable:  Borrow_Amount
   Independent variables:  clustertype Age Gender Access_MobilePhone Access_Internet Borrow_Bank Borrow_MFI Borrow_SACCOS Borrow_Gov Borrow_MM Borrow_Pension Borrow_Employer Borrow_Fam_Pay Borrow_Fam_NoPay Borrow_SavGroup Borrow_ML Borrow_Religious
   Number of data points: 9915
   ***********************************************************************
   *                    Results of Global Regression                     *
   ***********************************************************************

   Call:
    lm(formula = formula, data = data)

   Residuals:
    Min      1Q  Median      3Q     Max 
-41.226  -0.853  -0.566   0.660  17.137 

   Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
   (Intercept)         0.374761   0.099271   3.775 0.000161 ***
   clustertype         0.095177   0.055219   1.724 0.084806 .  
   Age                 0.002801   0.001508   1.858 0.063193 .  
   Gender             -0.314786   0.050635  -6.217 5.28e-10 ***
   Access_MobilePhone  0.388865   0.072849   5.338 9.61e-08 ***
   Access_Internet     0.188531   0.058460   3.225 0.001264 ** 
   Borrow_Bank         8.752231   0.226625  38.620  < 2e-16 ***
   Borrow_MFI          9.294665   0.192160  48.369  < 2e-16 ***
   Borrow_SACCOS       8.621757   0.359141  24.007  < 2e-16 ***
   Borrow_Gov          6.495580   0.570177  11.392  < 2e-16 ***
   Borrow_MM           2.848696   0.199602  14.272  < 2e-16 ***
   Borrow_Pension     -3.805797   1.114491  -3.415 0.000641 ***
   Borrow_Employer     3.609994   0.265147  13.615  < 2e-16 ***
   Borrow_Fam_Pay      8.767819   0.057526 152.414  < 2e-16 ***
   Borrow_Fam_NoPay    0.842179   0.215213   3.913 9.17e-05 ***
   Borrow_SavGroup     8.695528   0.099584  87.318  < 2e-16 ***
   Borrow_ML           7.036817   0.164615  42.747  < 2e-16 ***
   Borrow_Religious    8.537908   1.007762   8.472  < 2e-16 ***

   ---Significance stars
   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
   Residual standard error: 2.467 on 9897 degrees of freedom
   Multiple R-squared: 0.7967
   Adjusted R-squared: 0.7963 
   F-statistic:  2281 on 17 and 9897 DF,  p-value: < 2.2e-16 
   ***Extra Diagnostic information
   Residual sum of squares: 60213.56
   Sigma(hat): 2.46459
   AIC:  46060.71
   AICc:  46060.79
   BIC:  36457.38
   ***********************************************************************
   *          Results of Geographically Weighted Regression              *
   ***********************************************************************

   *********************Model calibration information*********************
   Kernel function: gaussian 
   Adaptive bandwidth: 797 (number of nearest neighbours)
   Regression points: the same locations as observations are used.
   Distance metric: Euclidean distance metric is used.

   ****************Summary of GWR coefficient estimates:******************
                             Min.     1st Qu.      Median     3rd Qu.    Max.
   Intercept           1.4317e-01  3.5045e-01  4.1365e-01  4.6078e-01  0.7614
   clustertype        -1.4891e-01 -1.5975e-02  3.9656e-02  1.9101e-01  0.4063
   Age                -8.7057e-04  7.8564e-04  1.9973e-03  4.0159e-03  0.0068
   Gender             -6.6714e-01 -3.9727e-01 -2.5455e-01 -2.0293e-01 -0.0347
   Access_MobilePhone  1.6506e-01  2.9360e-01  3.2302e-01  3.9375e-01  0.7621
   Access_Internet    -1.8643e-01 -7.3609e-02  6.0649e-02  2.0898e-01  0.6225
   Borrow_Bank         5.4351e+00  7.0067e+00  8.2255e+00  1.0070e+01 11.2152
   Borrow_MFI          7.3246e+00  9.1535e+00  9.5970e+00  9.9729e+00 10.7016
   Borrow_SACCOS       4.4478e+00  7.7405e+00  9.7792e+00  1.1108e+01 11.9250
   Borrow_Gov         -5.2991e+00  4.7055e+00  7.5962e+00  9.9634e+00 12.5339
   Borrow_MM           2.0950e+00  2.8234e+00  3.1301e+00  3.6069e+00  5.2954
   Borrow_Pension     -4.3570e+01 -3.3381e+01 -2.9732e-01  4.5420e+00  9.6900
   Borrow_Employer     3.5947e-01  2.4272e+00  5.1976e+00  8.2656e+00  9.4152
   Borrow_Fam_Pay      8.1504e+00  8.5064e+00  8.7685e+00  9.0348e+00  9.2714
   Borrow_Fam_NoPay   -1.9170e+00  3.2833e-01  1.4038e+00  3.7294e+00  5.1773
   Borrow_SavGroup     6.4485e+00  8.2138e+00  8.5398e+00  8.9293e+00 10.1296
   Borrow_ML           5.7152e+00  6.5983e+00  6.7792e+00  7.6926e+00  9.6076
   Borrow_Religious    1.6598e+00  4.4005e+00  9.4119e+00  1.1391e+01 13.0037
   ************************Diagnostic information*************************
   Number of data points: 9915 
   Effective number of parameters (2trace(S) - trace(S'S)): 158.3264 
   Effective degrees of freedom (n-2trace(S) + trace(S'S)): 9756.674 
   AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 44681.02 
   AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 44559.79 
   BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 35599.43 
   Residual sum of squares: 51347.3 
   R-square value:  0.8265969 
   Adjusted R-square value:  0.8237828 

   ***********************************************************************
   Program stops at: 2024-11-12 16:41:32.923033 
Observations

Coefficient Estimates:

  • Clustertype: Ranges from -0.149 to 0.406, showing a negative influence in some areas and positive in others. This suggests that certain clusters might either restrict or encourage borrowing differently based on location.

  • Gender: Consistently negative - indicative that across locations, males may have lower borrowing amounts than females, with the effect stronger in some areas.

  • Access to Mobile and Internet: Generally positive effects, suggesting that access to technology generally increases borrowing, although there are areas where internet access has a slight negative effect.

Borrowing Sources with Divergent Effects:

  • Borrow_Gov: Ranges widely from -5.30 to 12.53, showing that in some areas, government borrowing is linked to lower borrowing amounts, while in others, it has a high positive effect.

  • Borrow_Pension: Large variability (-43.57 to 9.69), suggesting that pension borrowing strongly decreases borrowing amounts in some areas while having a positive effect in others.

These results suggest that interventions aimed at influencing borrowing behavior could benefit from being spatially targeted. For example, policies could focus on enhancing mobile and internet access in regions where they are most effective or promoting specific borrowing sources depending on local impact.

5.2 Building Adaptive Bandwidth GWR Model for Emergency Savings

bw.adaptive2 <- bw.gwr(formula = Savings_Emerg ~ clustertype + Age + Gender + Access_MobilePhone + Access_Internet + Savings_SACCOS + 
                       Savings_MobileWallet + Savings_Group + Savings_Cash + Savings_Lifestock + Insurance_Health + Insurance_Life, 
                       data=Savings.sp, 
                       approach="CV", 
                       kernel="gaussian", 
                       adaptive=TRUE, 
                       longlat=FALSE)
Take a cup of tea and have a break, it will take a few minutes.
          -----A kind suggestion from GWmodel development group
Adaptive bandwidth: 6135 CV score: 2218.7 
Adaptive bandwidth: 3800 CV score: 2210.735 
Adaptive bandwidth: 2355 CV score: 2203.533 
Adaptive bandwidth: 1464 CV score: 2197.742 
Adaptive bandwidth: 911 CV score: 2190.838 
Adaptive bandwidth: 571 CV score: 2188.671 
Adaptive bandwidth: 359 CV score: 2186.766 
Adaptive bandwidth: 230 CV score: 2191.208 
Adaptive bandwidth: 441 CV score: 2186.659 
Adaptive bandwidth: 489 CV score: 2187.388 
Adaptive bandwidth: 408 CV score: 2187.146 
Adaptive bandwidth: 458 CV score: 2187.261 
Adaptive bandwidth: 427 CV score: 2186.801 
Adaptive bandwidth: 446 CV score: 2186.555 
Adaptive bandwidth: 453 CV score: 2187.262 
Adaptive bandwidth: 445 CV score: 2186.555 
gwr.adaptive2 <- gwr.basic(formula = Savings_Emerg ~ clustertype + Age + Gender + Access_MobilePhone + Access_Internet + Savings_SACCOS + 
                       Savings_MobileWallet + Savings_Group + Savings_Cash + Savings_Lifestock + Insurance_Health + Insurance_Life, 
                       data=Savings.sp,
                       bw=bw.adaptive, 
                       kernel = 'gaussian', 
                       adaptive=TRUE, 
                       longlat = FALSE)
gwr.adaptive2
   ***********************************************************************
   *                       Package   GWmodel                             *
   ***********************************************************************
   Program starts at: 2024-11-12 16:52:32.341079 
   Call:
   gwr.basic(formula = Savings_Emerg ~ clustertype + Age + Gender + 
    Access_MobilePhone + Access_Internet + Savings_SACCOS + Savings_MobileWallet + 
    Savings_Group + Savings_Cash + Savings_Lifestock + Insurance_Health + 
    Insurance_Life, data = Savings.sp, bw = bw.adaptive, kernel = "gaussian", 
    adaptive = TRUE, longlat = FALSE)

   Dependent (y) variable:  Savings_Emerg
   Independent variables:  clustertype Age Gender Access_MobilePhone Access_Internet Savings_SACCOS Savings_MobileWallet Savings_Group Savings_Cash Savings_Lifestock Insurance_Health Insurance_Life
   Number of data points: 9915
   ***********************************************************************
   *                    Results of Global Regression                     *
   ***********************************************************************

   Call:
    lm(formula = formula, data = data)

   Residuals:
     Min       1Q   Median       3Q      Max 
-1.27112 -0.45711  0.05265  0.49325  0.78906 

   Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
   (Intercept)           0.3405302  0.0188013  18.112  < 2e-16 ***
   clustertype           0.0541882  0.0105825   5.121 3.10e-07 ***
   Age                  -0.0008932  0.0002887  -3.094 0.001982 ** 
   Gender               -0.0438508  0.0096768  -4.532 5.92e-06 ***
   Access_MobilePhone    0.1719591  0.0139508  12.326  < 2e-16 ***
   Access_Internet       0.1491384  0.0111630  13.360  < 2e-16 ***
   Savings_SACCOS        0.2601189  0.0695143   3.742 0.000184 ***
   Savings_MobileWallet  0.3179673  0.0300478  10.582  < 2e-16 ***
   Savings_Group         0.3361823  0.0205502  16.359  < 2e-16 ***
   Savings_Cash          0.3173420  0.0840223   3.777 0.000160 ***
   Savings_Lifestock     0.2102591  0.0464044   4.531 5.94e-06 ***
   Insurance_Health      0.0601241  0.0840342   0.715 0.474335    
   Insurance_Life        0.2288977  0.2121822   1.079 0.280713    

   ---Significance stars
   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
   Residual standard error: 0.4741 on 9902 degrees of freedom
   Multiple R-squared: 0.1005
   Adjusted R-squared: 0.09937 
   F-statistic: 92.16 on 12 and 9902 DF,  p-value: < 2.2e-16 
   ***Extra Diagnostic information
   Residual sum of squares: 2225.745
   Sigma(hat): 0.4738438
   AIC:  13352.97
   AICc:  13353.01
   BIC:  3667.619
   ***********************************************************************
   *          Results of Geographically Weighted Regression              *
   ***********************************************************************

   *********************Model calibration information*********************
   Kernel function: gaussian 
   Adaptive bandwidth: 797 (number of nearest neighbours)
   Regression points: the same locations as observations are used.
   Distance metric: Euclidean distance metric is used.

   ****************Summary of GWR coefficient estimates:******************
                               Min.     1st Qu.      Median     3rd Qu.    Max.
   Intercept             0.17099309  0.24632582  0.34182781  0.40021000  0.4792
   clustertype          -0.09581006  0.01203795  0.05573308  0.08391108  0.1337
   Age                  -0.00304768 -0.00206891 -0.00103339  0.00022234  0.0020
   Gender               -0.08582796 -0.04839814 -0.03506907 -0.02368760 -0.0106
   Access_MobilePhone    0.08786736  0.15804272  0.17777684  0.22606676  0.3600
   Access_Internet       0.06711394  0.11846392  0.13445762  0.15505031  0.2256
   Savings_SACCOS        0.10866249  0.19687735  0.24050930  0.32311176  0.4540
   Savings_MobileWallet  0.19638212  0.25784342  0.30072857  0.38394570  0.4287
   Savings_Group         0.24534436  0.28275979  0.33304972  0.35441334  0.4304
   Savings_Cash          0.18572914  0.30648512  0.37018648  0.42097418  0.5555
   Savings_Lifestock    -0.24702905  0.09813798  0.19182483  0.24311827  0.3628
   Insurance_Health     -0.14961515 -0.04631745  0.03208975  0.11045633  0.2785
   Insurance_Life       -0.00122182  0.03197228  0.29657425  0.50091176  0.5423
   ************************Diagnostic information*************************
   Number of data points: 9915 
   Effective number of parameters (2trace(S) - trace(S'S)): 115.5799 
   Effective degrees of freedom (n-2trace(S) + trace(S'S)): 9799.42 
   AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 13201.2 
   AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 13113.26 
   BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 3890.787 
   Residual sum of squares: 2160.249 
   R-square value:  0.1269335 
   Adjusted R-square value:  0.116635 

   ***********************************************************************
   Program stops at: 2024-11-12 16:53:49.189879